Abstract

Background: Prognostic models have clinical appeal as aids to therapeutic decision making. Two main practical challenges in the development of such models are the assessment of model validity and the imputation of missing data. This study highlights the importance of imputing missing data and of applying the bootstrap technique in the development, simplification, and internal validation of a prognostic model.
Methods: Overall, 310 breast cancer patients were recruited. Missing data were imputed 10 times. Then, to assess the sensitivity of the model to small changes in the data (internal validity), 100 bootstrap samples were drawn from each of the 10 imputed data sets, giving 1000 samples. A Cox regression model was fitted to each of the 1000 samples. Only variables retained in more than 50% of samples were used to develop the final model.
Results: Four variables remained significant in more than 50% (i.e. 500 samples) of the bootstrap samples: tumour stage (91%), tumour grade (64%), history of benign breast disease (77%), and age at diagnosis (59%). Tumour stage was the strongest predictor, with an inclusion frequency exceeding 90%. Number of deliveries was correlated with age at diagnosis (r=0.35, P<0.001). Together, these two variables remained significant in more than 90% of samples.

Conclusion: We addressed two important methodological issues using a cohort of breast cancer patients. The algorithm combines multiple imputation of missing data with bootstrapping and has the potential to be applied in all kinds of regression modelling exercises to address the internal validity of models.
Introduction

Multifactorial regression models are frequently used in medicine to develop prediction tools. Developing such models needs careful consideration. Two issues of crucial importance are how to select variables for the final multifactorial model and how to avoid biased estimates due to missing data.
The first issue discussed here is the selection of variables to be included in the multifactorial model. In developing regression models, researchers usually apply stepwise variable selection procedures such as Backward Elimination (B.E.) and Forward Selection (F.S.). However, such methods suffer from a lack of stability, because the inclusion or exclusion of a few cases can change the variables selected for the model and the resulting parameter estimates (1-3).

This issue has been addressed in the development of a prediction model for acute myocardial infarction mortality. In 1000 bootstrap samples, B.E. produced 940 unique models (2). Out of 29 variables, only three were significant in all the bootstrap samples, 18 were selected in fewer than half of the bootstrap samples, and six in fewer than 10%. This demonstrates the sensitivity of B.E. to small differences between bootstrap samples. It has therefore been recommended to use B.E. in conjunction with a bootstrap procedure (4-6): B.E. is applied to a number of bootstrap samples (typically 100), and the proportion of samples in which each variable is selected (known as the inclusion frequency or percentage) is then checked.

The second issue is missing data. The simplest approach is to exclude cases with missing data. Case-wise (list-wise) deletion omits all records that contain missing data for any variable. Pair-wise deletion, on the other hand, uses a correlation matrix in which the correlation between each pair of variables is calculated from all cases that have valid data for those two variables. This method seems better than case-wise deletion, but the parameters of the model will then be based on different sets of data, with different sample sizes and different standard errors. Therefore, the resulting correlation matrix may not be suitable for further analysis such as regression modelling. Disadvantages of complete-case (C-C) analyses are highlighted elsewhere (7, 8). Furthermore, pair-wise deletion does not work when the aim is to develop a multifactorial regression model that adjusts the effect of each variable for the other covariates.
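
To make the difference between the two deletion strategies concrete, the following minimal Python sketch counts how many records each strategy would use. The six records and their values are invented for illustration and are not taken from the study data.

```python
# Toy records; None marks a missing value. All values are invented.
records = [
    {"age": 45, "grade": 2, "deliveries": 3},
    {"age": 52, "grade": None, "deliveries": None},
    {"age": 38, "grade": 1, "deliveries": None},
    {"age": 61, "grade": 3, "deliveries": 5},
    {"age": None, "grade": 2, "deliveries": 2},
    {"age": 49, "grade": 3, "deliveries": 4},
]


def casewise(rows):
    """Case-wise (list-wise) deletion: keep rows with no missing value at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_n(rows, a, b):
    """Pair-wise deletion: number of rows usable for the pair (a, b)."""
    return sum(1 for r in rows if r[a] is not None and r[b] is not None)


complete = casewise(records)
pairs = [("age", "grade"), ("age", "deliveries"), ("grade", "deliveries")]
pair_sizes = {(a, b): pairwise_n(records, a, b) for a, b in pairs}

print(len(complete))  # rows available to a complete-case model
print(pair_sizes)     # each correlation rests on its own subset of rows
```

In this toy data, case-wise deletion keeps 3 of the 6 records, while the three pairwise correlations rest on 4, 3, and 4 records, which is the mismatch in sample sizes described above.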

On the other hand, the Multivariate Imputation by Chained Equations (MICE) method can be used to impute the missing values (7, 9). The superiority of this method over other imputation algorithms has been addressed elsewhere (10).
Most studies have illustrated either the usefulness of MICE in tackling missing data or the usefulness of bootstrapping in addressing the internal validity of regression models. However, very few studies combine the two to provide a fuller picture of the uncertainties involved (i.e. uncertainty due to imputation of missing data and uncertainty due to sampling variation). A recent study recommended combining these two methods in prognostic studies (11).
Using a breast cancer data set, we have already illustrated different methodological issues (12). In a recent work, we described the MICE method and its superiority over Complete Case (C-C) analysis (9). In other words, the results of most imputation methods, including the MICE model, have been reported elsewhere.

The main aim of this paper was to combine these 
two steps (MICE and bootstrapping) to develop a 
prognostic model. 

Materials and Methods 

Patients and outcome 

The study sample comprised 310 breast cancer patients in Shiraz (southern Iran), of whom 56 died of breast cancer. Data were collected from the Hospital-based Cancer Registry of Nemazee Hospital, affiliated to Shiraz University of Medical Sciences. To protect patient confidentiality, the data set does not include personal information such as name, address, email, or phone number. Median follow-up time was 2.5 years.

Variables 

Variables offered to the multifactorial models were tumour stage with 3 levels (early, locally advanced, and advanced), tumour grade with 3 levels (1, 2, and 3), history of benign breast disease (positive versus negative), age at diagnosis (<=48 versus >48), treatment option (lumpectomy versus mastectomy), and number of deliveries.

Imputation of missing data

For all candidate variables, we imputed missing 
data using the MICE method. We imputed 10 val- 
ues for each missing value, thus creating 10 im- 
puted data sets (13). 
Selection of bootstrap samples to check internal validity

To reduce the risk of an over-fitted model, and to check the internal validity of the model, we employed bootstrap sampling to refine the models by excluding variables that were not reliably selected as necessary for prediction. We drew 100 bootstrap samples from each of the 10 imputed data sets, giving 1000 data sets in total.
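
The nested resampling scheme can be sketched in a few lines of Python. This is an illustration only: the imputed data sets are represented by lists of patient indices rather than real records, and the function names and the seed are hypothetical choices, not part of the original analysis.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from one data set."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def imputed_bootstrap_samples(imputed_sets, n_boot, seed=0):
    """Return (imputation index, bootstrap index, sample) for every combination."""
    rng = random.Random(seed)
    samples = []
    for m, data in enumerate(imputed_sets):
        for b in range(n_boot):
            samples.append((m, b, bootstrap_sample(data, rng)))
    return samples


# Stand-in for 10 imputed data sets of 310 patients each (indices only).
imputed_sets = [list(range(310)) for _ in range(10)]
samples = imputed_bootstrap_samples(imputed_sets, n_boot=100)

print(len(samples))  # 10 imputed sets x 100 bootstrap draws
```

Each of the resulting samples has the same size as the original cohort, so every model is fitted on 310 records, some of which appear more than once.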

Multifactorial model 

When modelling across bootstrap samples, the prognostic variables that are truly important should be retained in most of the models fitted. This is because each bootstrap replication is a random sample that should reflect and mimic the underlying structure of the data, and it is this structure that should drive the variables needed in the majority of models fitted (4-6). In other words, the bootstrap technique can be used as a measure of internal validity. Therefore, the inclusion frequency can be used to screen variables for selection (1, 14). It has been shown that including a variable at selection levels of 1% and 5% in the original data corresponds to cut-off values for the bootstrap inclusion fraction of 73% and 50%, respectively (1). Therefore, only variables retained in more than 50% of samples (i.e. more than 500 out of 1000 samples) were selected to construct the final model.
It should be added that when two independent variables are correlated and their combined inclusion frequency exceeds 90%, the one with the higher inclusion frequency should be offered to the model. Otherwise, both should be omitted (1, 14).
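
The screening rules can be expressed as a short Python sketch. The ten sets of selected variables below are invented, and the sketch assumes that the "combined" inclusion frequency of a correlated pair means the share of samples in which at least one of the two variables was selected; the text does not define this explicitly.

```python
def inclusion_frequency(selected_per_sample, variables):
    """Share of samples in which each variable was selected."""
    n = len(selected_per_sample)
    return {v: sum(v in s for s in selected_per_sample) / n for v in variables}


def screen(freq, cutoff=0.50):
    """Keep variables selected in more than `cutoff` of the samples."""
    return [v for v, f in freq.items() if f > cutoff]


def resolve_correlated_pair(selected_per_sample, freq, a, b, joint_cutoff=0.90):
    """Return the variable to keep from a correlated pair, or None to drop both.

    Assumption: the joint frequency is the share of samples containing a or b.
    """
    n = len(selected_per_sample)
    joint = sum((a in s) or (b in s) for s in selected_per_sample) / n
    if joint > joint_cutoff:
        return a if freq[a] >= freq[b] else b
    return None


# Invented selection results from 10 resampled models.
selected = [
    {"stage", "age"}, {"stage", "deliveries"}, {"stage", "age", "grade"},
    {"stage", "age"}, {"stage", "deliveries", "grade"}, {"stage", "age"},
    {"age", "grade"}, {"stage", "deliveries"}, {"stage", "age", "grade"},
    {"stage", "age"},
]
variables = ["stage", "grade", "age", "deliveries"]

freq = inclusion_frequency(selected, variables)
kept = screen(freq)
pair_choice = resolve_correlated_pair(selected, freq, "age", "deliveries")
print(freq, kept, pair_choice)
```

With these toy selections, stage and age pass the 50% cut-off, and age is preferred over deliveries because at least one of the pair appears in every sample and age is selected more often.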

Aggregation of results 

Using only the reliable variables identified in the previous step (i.e. those that remained significant in >50% of samples), a final model was then fitted to each of the 1000 samples. Applying Rubin's rules, the coefficients of these 1000 models were averaged and their standard errors combined (8, 9). Final Hazard Ratios (HR) were then estimated by exponentiating the aggregated coefficients.
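
Rubin's rules, as usually stated, pool m estimates by averaging them and by combining the within-sample and between-sample variances. The Python sketch below shows that standard form on invented coefficients. How the bootstrap layer was treated when pooling the 1000 models is not specified in the text, so this illustrates the rule itself rather than the exact computation used in the study.

```python
from math import exp, sqrt


def pool_rubin(estimates, std_errors):
    """Pool estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(se ** 2 for se in std_errors) / m               # mean within variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between variance
    total_var = within + (1 + 1 / m) * between
    return q_bar, sqrt(total_var)


# Invented log-hazard coefficients and standard errors from 5 fitted models.
coefs = [0.80, 0.90, 0.85, 0.95, 0.75]
ses = [0.30, 0.32, 0.31, 0.29, 0.33]

beta, se = pool_rubin(coefs, ses)
hr = exp(beta)                                        # pooled hazard ratio
ci = (exp(beta - 1.96 * se), exp(beta + 1.96 * se))   # approximate 95% interval
print(round(beta, 3), round(hr, 2), ci)
```

The pooled coefficient here is 0.85, giving a hazard ratio of about 2.34. The pooled standard error is larger than the average within-model standard error because the between-model spread is added in.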



Results 

The inclusion frequencies of all 6 variables offered to the multifactorial models are given in Table 1. In total, four variables remained significant in more than 50% of samples: tumour stage, tumour grade, history of benign breast disease, and age at diagnosis. The final multifactorial model for Breast Cancer Specific Death (BCSD) retained these four variables; Table 1 also gives the aggregated hazard ratios (as described in Methods).
Tumour stage appears to be the most important predictor of breast cancer specific death. This variable contributed significantly to the model in more than 90% of replications.

History of benign breast disease was the second most important predictor. This variable was retained in roughly three-quarters of replications. The hazard of breast cancer specific death for those with a positive history of benign breast disease was 2.4 times (95% C.I.: 1.25, 4.19) that of others.
Tumour grade was retained in about 60% of samples and was therefore used to develop the final model. However, if one wishes to use 0.01 as the significance level, then this variable should not be offered to the final model, since its inclusion frequency is lower than 73%.
The inclusion frequencies of age at diagnosis and number of deliveries were 59% and 32%, respectively. The Spearman correlation between these two variables was 0.35 (P<0.001). This means that number of deliveries can serve as a rough surrogate for the age variable.
However, when both variables were offered to the models, the age variable was retained more often (59% versus 32%). Together the two variables were retained in more than 90% of samples, so, following the recommendations given in the Methods section, we used the age variable in the final model.
Patients in the early stage underwent either lumpectomy or mastectomy, but in the other stages only mastectomy was performed. Treatment option remained significant in only a small proportion of samples.

Table 1: Multifactorial Breast Cancer Specific Death (BCSD) model: relative frequency of covariate inclusion (in 1000 bootstrap samples drawn from 10 imputed data sets) and estimated HRs
Discussion

By imputing multiple data sets (to avoid loss of sample size due to missing data) and then bootstrapping (to check the reliability of variable inclusion across models, known as internal validity), we applied a methodology that has potential for future application in all medical areas. The approach has the advantage that it takes both imputation variation and sampling variation into account. We retained variables with an inclusion frequency of more than 50%. It should be added that when the aim is to fit a parsimonious model, only variables with a very high inclusion frequency should be retained. On the other hand, when adjustment for covariates is the aim, variables with a low inclusion frequency also need to be selected, and therefore a low cut-off for the inclusion frequency should be chosen (1).

Here we combined MICE and the bootstrap. However, we have not made any comparison with the C-C model or with the MICE model without bootstrap, because those analyses have been published elsewhere (12). Our previous results showed that C-C analysis provides biased estimates. However, in terms of the variables contributing to the final model, the results of the MICE model without bootstrap (presented in (9)) were the same as the results presented here. This might partially be explained by the fact that here we estimated eight regression coefficients, so that the number of Events Per Variable (EPV) was an acceptable figure (56/8 = 7). It has been shown that the lower the EPV, the more unstable the model.
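
The EPV arithmetic can be reproduced with a small Python sketch. The count of eight coefficients is reconstructed here under an assumption that the text does not state: each categorical variable with k levels contributes k - 1 dummy coefficients, and number of deliveries enters as a single continuous term.

```python
# Number of levels per candidate variable; None marks a single continuous term.
# This coding is an assumed reading of the Variables section, not a stated fact.
candidates = {
    "tumour stage": 3,
    "tumour grade": 3,
    "history of benign breast disease": 2,
    "age at diagnosis": 2,
    "treatment option": 2,
    "number of deliveries": None,
}


def n_coefficients(levels):
    """A k-level factor needs k - 1 coefficients; a continuous term needs 1."""
    return sum(1 if k is None else k - 1 for k in levels.values())


events = 56  # breast cancer deaths reported in the study sample
coefs = n_coefficients(candidates)
epv = events / coefs
print(coefs, epv)
```

Under this reading the six candidate variables give eight coefficients, and 56 events divided by 8 gives the EPV of 7 quoted in the text.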

Ultimately, the most important issue for a model is its external validity: the extent to which it provides good predictions for similar patients who were not involved in the development of the model. However, before external validity can be checked, adequate internal validity is a prerequisite. Internal validation refers to performance in patients from a population similar to that from which the development sample was drawn. Internal validity is therefore in contrast to external validity, where different populations are used to develop and test the model (15).

Generally, internal validity can be investigated by splitting the data into training and test samples, by cross-validation, or by a bootstrap resampling procedure (5, 16).
The data-splitting approach allows the hypothesis tests to be confirmed in the test sample. However, this method leads to a smaller sample size, and consequently lower power, in both the training and test samples.

The cross-validation method randomly divides the data, several times, into training and test samples, and applies the results obtained in the training sets to the test sets. Leave-one-out cross-validation allows the researcher to build the model on N-1 cases (almost all) and then test it on the case that was left out. This method allows the model to be developed without sacrificing sample size. However, it is often the case that the criterion used to compare the performance of validation techniques cannot be calculated for a single case (17).
Alternatively, one can do 10-fold cross-validation. As with the data-splitting method, this technique may not be accurate if the training or test set is too small (18). With a total sample size of 310, each test sample would comprise only 31 individuals. Inference based on such a small number of cases may not be robust.
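
The fold sizes mentioned above can be checked with a minimal Python sketch of a 10-fold split. Patients are represented by indices only, and the helper name and seed are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


folds = k_fold_indices(310, 10)
sizes = [len(f) for f in folds]
train_sizes = [310 - s for s in sizes]
print(sizes)        # size of each test fold
print(train_sizes)  # size of the matching training set
```

Each test fold holds 31 patients and each model is trained on the remaining 279, so any performance measure computed per fold rests on very few cases, and on even fewer events.
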
On the other hand, the bootstrap technique does not sacrifice sample size and allows the researcher to extract as much information as possible (19). It has been shown that, for assessing internal validity, the bootstrap would be the best approach (17).

Assessment of the internal validity of a model is an important issue. As an example, Chen et al. developed a prognostic model. To check its internal validity, they drew 100 bootstrap samples, but the original model was seen in just 2% of replications (20). However, the bootstrap inclusion frequencies of five of the six variables from the original model were between 64% and 82%.
In another study, in node-positive breast cancer patients, the final prognostic model was not seen in any of 200 bootstrap replications (21). These two examples show the importance of assessing the internal validity of a model to deal with sampling variation.

Here we demonstrated the application of the bootstrap technique in assessing the internal validity of models. We should add that the bootstrap method has other methodological uses as well. For example, one can apply it to provide a confidence interval for any parameter. Assume that the aim is to estimate the mean of a continuous variable in the population. We can estimate the mean from a sample and then apply the normal approximation formula to provide the confidence interval. However, if the data violate the distributional assumption, this approximation should be avoided. In that case, it is possible to draw bootstrap samples and calculate the mean of each sample. After sorting these bootstrap means, the 2.5th and 97.5th percentiles can be used as the lower and upper bounds of a 95% confidence interval.
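
The percentile procedure just described can be written as a short Python sketch. The skewed sample, the number of bootstrap draws, and the seed are illustrative assumptions.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, take the mean of each resample, then sort.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot_means[round((alpha / 2) * n_boot)]
    upper = boot_means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented, right-skewed sample for which a normal approximation is doubtful.
values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 13, 21]
lower, upper = bootstrap_percentile_ci(values)
print(mean(values), lower, upper)
```

For a skewed sample such as this one, the resulting interval is typically asymmetric around the sample mean, which a normal-approximation interval cannot reflect.
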
It has been shown that internal validity may not be sufficient for good performance of the model in future patients. External validation is essential before prediction models are implemented in clinical practice (22). It has been emphasized that the usefulness of a model is determined by how well it works in practice and not by how many zeros there are in the associated P values in the multifactorial model (15, 23).
The importance of assessing external validity is illustrated here by an example. A prediction model for the presence of serious bacterial infections in children with fever without source was derived (22). The discrimination (C-index) and predictive ability (R-square) of the model were 0.83 and 32%, respectively. The model was then validated in an independent sample (n=179), giving a discrimination ability of 0.57 (0.47-0.67) and an R-square of 20%.
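
For readers unfamiliar with the C-index, the Python sketch below shows how it is computed for a binary outcome, as in the example above: it is the share of (event, non-event) pairs in which the patient with the event received the higher predicted risk, with ties counted as one half. The scores and outcomes are invented. For survival data with censoring, only pairs whose ordering can be determined are used, which this sketch does not cover.

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome (1 = event, 0 = no event)."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0   # event patient ranked higher: concordant
            elif e == ne:
                concordant += 0.5   # tie in predicted risk
    return concordant / (len(events) * len(non_events))


# Invented predicted risks and observed outcomes for five patients.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
outcomes = [1, 0, 1, 0, 0]
c = c_index(scores, outcomes)
print(round(c, 3))
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which puts the drop from 0.83 to 0.57 in the cited validation study into perspective.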

We combined multiple imputation and bootstrapping methods to develop a prognostic model that predicts BCSD. This algorithm has been applied elsewhere as well (24). The MICE method has the advantage that it takes the imputation variation into account. Furthermore, the bootstrap approach allows the sensitivity of the model to small changes in the sample to be checked. Although the bootstrap method is a powerful technique for checking the internal validity of a model, an independent, fresh data set is needed to investigate the transportability of the model (15).

Ethical considerations 

Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.

Acknowledgments 

The data set analyzed in this project was collected 
under the direction of Professor Talei at Shiraz 
University of Medical Sciences. No funding was 
received for this study. The authors declare that 
there is no conflict of interests. 






Baneshi & Talei.: Assessment of Internal Validity of Prognostic Models ... 



References 

1. Sauerbrei W, Schumacher M (1992). A bootstrap resampling procedure for model building: application to the Cox regression model. Stat Med, 11 (16): 2093-109.

2. Austin PC, Tu JV (2004). Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. J Clin Epidemiol, 57 (11): 1138-46.

3. Derksen S, Keselman J (1992). Backward, forward, and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45 (2): 265-82.

4. Altman DG, Andersen PK (1989). Bootstrap investigation of the stability of a Cox regression model. Stat Med, 8 (7): 771-83.

5. Harrell FE, Lee KL, Mark DB (1996). Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med, 15 (4): 361-87.

6. Steyerberg EW, Bleeker SE, Moll HA, Grobbee DE, Moons KG (2003). Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J Clin Epidemiol, 56 (5): 441-7.

7. Baneshi MR, Talei AR (2010). Impact of imputation of missing data on estimation of survival rates: an example in breast cancer. Iranian Journal of Cancer Prevention, 3 (3): 127-31.

8. Baneshi MR, Faramarzi H, Marzban M (2012). Prevention of disease complications through diagnostic models: how to tackle the problem of missing data? Iranian J Publ Health, 14 (1): 66-72.

9. Baneshi MR, Talei AR (2011). Multiple Imputation in Survival Models: Applied on Breast Cancer Data. Iranian Journal of Cancer Prevention, 13 (8): 547-52.

10. Baneshi MR, Talei AR (2012). Does the Missing Data Imputation Method Affect the Composition and Performance of Prognostic Models? Iranian Red Crescent Medical Journal, 14 (1): 31-6.

11. Heymans MW, Van Buuren S, Knol DL, Van Mechelen W, de Vet HC (2007). Variable selection under multiple imputation using the bootstrap in a prognostic study. BMC Med Res Methodol, 7: 33.

12. Rajaeefard AR, Baneshi MR, Talei AR, Mehrabani D (2009). Survival Models in Breast Cancer. Iranian Red Crescent Medical Journal, 11 (3): 295-300.

13. Schafer JL (1999). Multiple imputation: a primer. Stat Methods Med Res, 8 (1): 3-15.

14. Austin PC, Tu JV (2004). Bootstrap methods for developing predictive models in cardiovascular research. American Statistician, 58: 131-7.

15. Justice AC, Covinsky KE, Berlin JA (1999). Assessing the generalizability of prognostic information. Ann Intern Med, 130 (6): 515-24.

16. Harrell FE, Lee KL, Matchar DB, Reichert TA (1985). Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treat Rep, 69 (10): 1071-7.

17. Steyerberg EW, Harrell FE Jr, Borsboom GJ et al. (2001). Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol, 54 (8): 774-81.

18. Azuaje F (2003). Genomic data sampling and its effect on classification performance assessment. BMC Bioinformatics, 28: 4-5.

19. Sauerbrei W, Royston P (2007). Modelling to extract more information from clinical trials data: On some roles for the bootstrap. Stat Med, 26 (27): 4989-5001.

20. Chen CH, George SL (1985). The bootstrap and identification of prognostic factors via Cox's proportional hazards regression model. Stat Med, 4 (1): 39-46.

21. Sauerbrei W, Royston P (1999). Building multivariate prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of Royal Statistical Society, 162 (1): 71-94.

22. Bleeker SE, Moll HA, Steyerberg EW et al. (2003). External validation is necessary in prediction research: a clinical example. J Clin Epidemiol, 56 (9): 826-32.

23. Altman DG, Royston P (2000). What do we mean by validating a prognostic model? Stat Med, 19 (4): 453-73.

24. Baneshi MR, Warner P, Anderson N, Edwards J, Cooke TG, Bartlett JMS (2010). Tamoxifen resistance in early breast cancer: statistical modelling of tissue markers to improve risk prediction. Br J Cancer, 102: 1503-10.






